An empirical evaluation of set similarity join techniques
Similar resources
An Empirical Evaluation of Set Similarity Join Techniques
Set similarity joins compute all pairs of similar sets from two collections of sets. We conduct extensive experiments on seven state-of-the-art algorithms for set similarity joins. These algorithms adopt a filter-verification approach. Our analysis shows that verification has not received enough attention in previous works. In practice, efficient verification inspects only a small, constant num...
Set Similarity Join on Probabilistic Data
Set similarity join has played an important role in many real-world applications such as data cleaning, near-duplicate detection, data integration, and so on. In these applications, set data often contain noise and are thus uncertain and imprecise. In this paper, we model such probabilistic set data on two uncertainty levels, that is, set and element levels. Based on them, w...
Scalable and robust set similarity join
Set similarity join is a fundamental and well-studied database operator. It is usually studied in the exact setting where the goal is to compute all pairs of sets that exceed a given similarity threshold (measured e.g. as Jaccard similarity). But set similarity join is often used in settings where 100% recall may not be important; indeed, where the exact set similarity join is itself only an ap...
Leveraging Set Relations in Exact Set Similarity Join
Exact set similarity join, which finds all the similar set pairs from two collections of sets, is a fundamental problem with a wide range of applications. The existing solutions for set similarity join follow a filtering-verification framework, which generates a list of candidate pairs through scanning indexes in the filtering phase, and reports those similar pairs in the verification phase. Th...
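The filter-verification pattern that these abstracts refer to can be illustrated with a minimal Python sketch. It is not the method of any of the listed papers: it assumes Jaccard similarity with a threshold strictly greater than 0, uses a plain inverted index plus a length filter as the filtering phase, and the names `jaccard` and `similarity_join` are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def similarity_join(left, right, threshold):
    """Return sorted (i, j) pairs with jaccard(left[i], right[j]) >= threshold.

    Assumes 0 < threshold <= 1. Empty sets are never reported, because
    candidates are found only through shared tokens.
    """
    # Inverted index over the right collection: token -> indices of sets containing it.
    index = defaultdict(list)
    for j, s in enumerate(right):
        for token in s:
            index[token].append(j)

    result = []
    for i, r in enumerate(left):
        # Filtering phase: only sets sharing at least one token can be similar.
        candidates = set()
        for token in r:
            candidates.update(index.get(token, ()))

        for j in candidates:
            s = right[j]
            # Length filter: Jaccard >= t implies t * |r| <= |s| <= |r| / t.
            if len(s) < threshold * len(r) or len(s) * threshold > len(r):
                continue
            # Verification phase: compute the exact similarity.
            if jaccard(r, s) >= threshold:
                result.append((i, j))
    return sorted(result)


left = [{1, 2, 3}, {4, 5}]
right = [{2, 3, 4}, {4, 5, 6}, {7}]
pairs = similarity_join(left, right, 0.5)
```

The published algorithms use stronger filters than this sketch; the point here is only the split between a cheap candidate-generation step and an exact verification step.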
Journal
Journal title: Proceedings of the VLDB Endowment
Year: 2016
ISSN: 2150-8097
DOI: 10.14778/2947618.2947620